Progress of Speech Recognition using the Corpus of Spontaneous Japanese (CSJ)
Abstract
This report gives an overview of the current state of spontaneous speech recognition using the “Corpus of Spontaneous Japanese (CSJ)”. It is shown that the large-scale corpus has had a strong impact on training acoustic and language models that account for the morphological and pronunciation variations characteristic of spontaneous Japanese. Unsupervised adaptation of these models and of the speaking rate is also effective, and we have achieved a word accuracy of 78.0%, which is a significant improvement over the past couple of years.
Similar resources
Corpus of Spontaneous Japanese: Its Design and Evaluation
The Corpus of Spontaneous Japanese, or CSJ, is a large-scale database of spontaneous Japanese. It contains speech signals and transcriptions of about 7 million words, along with various annotations such as POS and phonetic labels. After describing its design issues, a preliminary evaluation of the CSJ is presented. The results strongly suggest the usefulness of the CSJ as a resource for the study of spo...
Training a Language Model Using Webdata for Large Vocabulary Japanese Spontaneous Speech Recognition
This paper describes a language modeling method using large-scale spoken language data retrieved from the Web for spontaneous speech recognition. We downloaded 15 million Web pages on a comprehensive range of topics. Next, spoken-language-like texts were selected from the downloaded Web data using the naïve Bayes classifier, and typical linguistic phenomena such as fillers and pauses were added usin...
Dependency-structure Annotation to Corpus of Spontaneous Japanese
In Japanese, syntactic structure of a sentence is generally represented by the relationship between phrasal units, or bunsetsus in Japanese, based on a dependency grammar. In the same way, the syntactic structure of a sentence in a large, spontaneous, Japanese-speech corpus, the Corpus of Spontaneous Japanese (CSJ), is represented by dependency relationships between bunsetsus. This paper descri...
A Corpus-based Analysis on Prosody and Discourse Structure in Japanese Spontaneous Monologues
The aim of this paper is twofold. First, the paper attempts to investigate prosody and discourse structure in Japanese spontaneous monologues by using the prosodic labels of the Corpus of Spontaneous Japanese (CSJ). The analyses of F0 peak trends and prosodic breaks confirmed previous findings in [1]. Second, the paper attempts to evaluate the validity of prosodic labels of the X-JToBI syst...
Benchmark Test for Speech Recognition Using the Corpus of Spontaneous Japanese
We present benchmark results of automatic speech recognition using the Corpus of Spontaneous Japanese (CSJ), which has been developed in a five-year national project and will be the largest spontaneous speech database. New test-sets are designed for both academic presentation speech and extemporaneous public speech, which are the two major categories in the corpus. The test-sets are selected ...
Publication date: 2004